Domain Specific Embedded Compilers 


Daan Leijen and Erik Meijer 
University of Utrecht 
Department of Computer Science 
POBox 80.089, 3508 TB Utrecht, The Netherlands 

{daan, erik}@cs.uu.nl, http://www.cs.uu.nl/~ldaan, erik} 


Abstract 


Domain-specific embedded languages (DSELs) ex¬ 
pressed in higher-order, typed (HOT) languages pro¬ 
vide a composable framework for domain-specific ab¬ 
stractions. Such a framework is of greater util¬ 
ity than a collection of stand-alone domain-specific 
languages. Usually, embedded domain specific lan¬ 
guages are build on top of a set of domain spe¬ 
cific primitive functions that are ultimately imple¬ 
mented using some form of foreign function call. 
We sketch a general design pattern for embedding 
client-server style services into Haskell using a do¬ 
main specific embedded compiler for the server’s 
source language. In particular we apply this idea 
to implement Haskell/DB, a domain specific em- 
bdedded compiler that dynamically generates of SQL 
queries from monad comprehensions, which are then 
executed on an arbitrary ODBC database server. 


1 Introduction 


Databases are ubiquitous in computer science. For 
instance, a web site is usually a fancy facade in 
front of a conventional database, which makes the 
information available in a convenient browsable 
form. Sometimes, servers are even running directly 
on a database engine that generates pages from 
database records on-the-fly. Hence is is not sur¬ 
prising that database vendors provide hooks that 
enable client applications to access and manipulate 
their database servers. On Unix platforms this is 
usually done via ODBC, under Windows there are 
confusingly many possibilities such as ADO, OLE 
DB, and ODBC. 

What is common to all the above database bindings 
is that queries are communicated to the database 


as unstructured strings (usually) representing SQL 
expressions. This low-level approach has many dis¬ 
advantages: 

• Programmers get no (static) safeguards against 
creating syntactically incorrect or ill-typed 
queries, which can lead to hard to find runtime 
errors. 

• Programmers have to distinguish between at 
least two different languages, SQL and the 
scripting language that generates the queries 
and submits them to the database engine 
(Perl, Visual Basic). This makes programming 
needlesly complex. 

• Programmers are exposed to the accidental 
complexity and idiosyncrasies of the particular 
database binding, making code harder to write 
and less robust against the vendor’s fads [1]. 

We argue that domain-specific embedded languages 
[9] (DSELs) expressed in higher-order, typed (HOT) 
languages, Haskell [10] in our case, provide a com¬ 
posable framework for domain-specific abstractions 
that is of greater utility than a collection of stand¬ 
alone domain-specific languages: 

• Programmers have to learn only one language, 
domain specific abstractions are exposed to the 
host language as extension libraries. 

• In many cases it is possible to present libraries 
using a convenient domain specific syntax. 

• It is nearly always 1 possible to guarantee that 
programmers can only produce syntactically 
correct target programs, and in many cases we 
are able to impose domain specific typing rules. 

x For instance _L is a value of every type in Haskell, so can¬ 
not prevent programmers from producing infinite or partially 
defined values. 







• Programmers can seamlessly integrate with 
other domain specific libraries (e.g. CGI, mail), 
which are accessible in the same way as any 
other library. This is a largely underestimated 
benefit of using the embedded approach. Con¬ 
necting different domain specific languages to¬ 
gether is usually quite difficult. 

• Programmers can leverage on existing language 
infrastructure such as the module and type sys¬ 
tem and the built-in abstraction mechanisms. 

Note that the ideas underlying our thesis date way 
back to 1966 when Peter Landin [12] already ob¬ 
served that all programming languages comprise a 
domain independent linguistic framework and a do¬ 
main dependent set of components. What is new in 
this paper is that we show how to embed the terms 
and the type system of another (domain specific) 
programming language into the Haskell framework, 
which dynamically compiles and executes programs 
written in the embedded language. Moreover, no 
changes to the syntax or additions of primitives were 
needed to embed the language in Haskell. 


1.1 Overview 


We begin by giving a minuscule introduction to 
Haskell and a crash course in relational databases, 
and we show how a typical Visual Basic and a 
typical Haskell program would access a relational 
database. Next we sketch a general design pat¬ 
tern for term- and type-safe embedding client-server 
style services into Haskell using an evaluator for a 
subset of SQL expressions as an example server. We 
then turn our attention to the more challenging task 
of embedding a database server in Haskell. Section 7 
contrasts the Haskell and Visual Basic implementa¬ 
tions of a example web-page that generates HTML 
from a database of exam marks. We finish with 
some conclusions and some ideas for future work. 


2 Minuscule introduction to Haskell 


The main virtue of a functional language is that 
functions are first-class citizens that can be stored 
in lists, or passed as arguments to and returned 
from other functions. To emphasize the fact that 
functions of type a -> b have the same status as 


any other kind of value, we usually write them 
as lambda-expressions f = \a -> b instead of the 
more common Haskellish notation f a = b. 

Function application is given by juxtaposition in 
Haskell and associates to the left. Thus when we 
have the three argument function line: 

line = \a -> \b -> \x -> a*x + b 

the application line 2 3 denotes the single argu¬ 
ment function \x -> 2*x + 3. The type of the 
line function can be specified explicitly: 

line :: Int -> Int -> Int -> Int 

The type shows that line takes three Int argu¬ 
ments and returns an Int. 

Case expressions are used to define functions by case 
distinction, for instance the factorial function can be 
defined in Haskell as 


fac :: Num a => a -> a 

case n of 

{ 0 -> 1 

; n -> n * fac (n-1) 

I 

In Haskell, polymorphic types can be constrained 
by means of type contexts. The given type for the 
factorial function fac : : Num a=>a->a->a 
says that function fac has type a -> a -> a for all 
types a that are instances of the Num class. Unsur¬ 
prisingly it is the case that Num Int, Num Float and 
Num Double are all true. 

We will represent database rows by extensible 
records, an experimental feature that is cur¬ 
rently only supported by the the TREX ex¬ 
tension of the Hugs implementation of Haskell 
[8]. A record is nothing more than an asso¬ 
ciation list of field-value pairs. For instance 
the record (x = 3, even = False) :: Rec (x 
: : Int, even : : Bool) has two fields, x of type 
Int and even of type Bool. A record of type Rec r 
can be extended by a field z provided that z does not 
already occur in r. The fact that record r should 
lack field z is indicated by the constraint r\z, thus 
the type of a function that adds an f oo field to a 
record becomes: 


extendWithFoo :: r\foo => 



a -> Rec r -> Rec (foo :: a I r) 

extendWithFoo = \a -> \r -> (foo = a I r) 

Unfortunately, labels are not (yet) first class values 
in TREX, so we cannot write a generic function that 
extends a given record with a new field: 

— WRONG 

extendWith = \(f,a) -> \r -> (f = a I r) 

In our case the lack of first class labels means that 
we have to repeat a lot of code that only differs in 
the names of some labels. Another deficiency of the 
current implementation of TREX records is the fact 
that it is impossible to formulate a constraint on all 
the values in a given record, for instance, we would 
like to constrain a record to contain only values on 
which equality is defined. Currently, there is just 
one built-in constraint ShowRecRow r that indicates 
that all values in row r are in the Show class. 

When interacting with the outside world or access¬ 
ing object models, we have to deal with side-effects. 
In Haskell, effectful computations live in the 10 
monad [15]. A value of type 10 a is a latently ef¬ 
fectful computation that, when executed , will pro¬ 
duce a value of type a. For instance, the command 
getChar : : 10 Char will read a character from the 
standard input when it is executed. 

Like functions, latently effectful computations are 
first class citizens that can be passed as arguments, 
stored in list, and returned as results. For exam¬ 
ple putChar : : Char -> 10 () is a function that 
takes a character and then returns a computation 
that, when executed, will print that character on 
the standard output. 

Effectful computations can be composed 
using the do{}-notation. The command 

dof a <- ma; f a > : : 10 b is a latent com¬ 
putation, that, when executed, first executes the 
command ma : : 10 a to obtain a value a : : a, 
passes that to the action-producing function 
f : : a -> 10 b and then executes (f a) to 
obtain a value of type 10 b. For example, when 
executed, the command: 

do{ c <- getChar 
; putChar c 
} 

reads a character from the standard input and 
copies it to the standard output. 


The usefulness of monads goes far beyond in¬ 
put/output, many other type constructors are mon¬ 
ads as well. In section 6 we will define the Query 
monad that allows us to write queries using the 
same do-Q notation that we introduced here for IO- 
computations. 

In this paper we adopt style conventions that em¬ 
phasize when we are dealing with effectful com¬ 
putations. Specifically, all expressions of monadic 
type (such as 10 and Query) are written with 
an explicit do{}. To reflect the influence of the 
OO style, we will use postfix function application 
object#method = method object to mimic the 
object .method notation. Together with the con¬ 
vention for writing functions as lambda-expressions, 
this results in highly stylized programs from which 
it is easy to tell the type of an expression by looking 
at its syntactic shape. 


3 A crash course in relational 
databases 


In a relational database [5], data is represented 
as sets of tuples. For example take the following 
database Rogerson of objects and some of their 
properties [17]: 


Object 

Edible 

Inheritance 

President 

Rich people 

False 

False 

True 

Bean plants 

True 

False 

False 

CORBA 

False 

True 

False 

COM 

False 

False 

False 


We can conclude from this table that bean plants are 
edible, and that rich people can run for president. 
We can query the database more systematically us¬ 
ing relational algebra. 

The selection operator a specifies a subset of rows 
whose attributes satisfy some property. For example 
we can eliminate all entries for objects that can run 
for president from the database using the expression 

(TPresident=False RogerSOU: 


Object 

Edible 

Inheritance 

President 

Bean plants 

True 

False 

False 

CORBA 

False 

True 

False 

COM 

False 

False 

False 






The projection operator 7r specifies a subset of the 
columns of the database. For example, we can 
return all objects that are edible using the query 
7T Object {.^Edible—True RogerSOTl) 


Object 
Bean plants 


Another typical operation on relations is join m that 
combines two relations by merging tuples whose 
common attributes have identical values. Hence, if 
we join the Presidents table 


| Maine 

President 

Starr 

False 

Clinton 

True 


with the Rogerson table using 
7T Name,Object(Presidents cx Rogerson) we get 
a table with the name and object description of 
people that can run for president: 


Name 

Object 

Clinton 

Rich people 


3.1 The SQL way: SELECT statement 


SQL is the defacto standard programming language 
to formulate queries over relational databases. The 
SQL query 


SELECT columns 
FROM tables 
WHERE criteria 


combines selections, projections and joins in one 
powerful primitive. The SELECT clause specifies the 
columns to project, the FROM clause specifies the ta¬ 
bles where the columns are located and the WHERE 
clause specifies which rows in the tables should be 
selected. 

The query 1Tp re sident=False Rogerson is expressed 
in SQL as: 

SELECT * 

FROM Rogerson AS r 

WHERE r.President = FALSE 


The query tt object {pEdibie=True Rogerson ) be¬ 
comes: 


SELECT r.Object 
FROM Rogerson AS r 
WHERE r.Edible = TRUE 


The query 7 TName,Object(Presidents cx Rogerson) 
is expressed as: 

SELECT p.name, r.Object 

FROM Rogerson AS r, Presidents AS p 

WHERE r.President = p.President 


3.2 The VB way: Unstructured strings 


The common way to do query processing from Vi¬ 
sual Basic is to build an unstructured string rep¬ 
resenting the SQL query and submitting that to a 
database server object (we will discuss the ADO ob¬ 
ject model in more detail in section 6.1). So for 
instance, The query op res id en t=Faise Rogerson is 
expressed in Visual Basic as: 


Q = "SELECT *" 

Q = Q & "FROM Rogerson AS r" 

Q = Q & "WHERE r.President = FALSE" 

Set RS = CreateObject("ADODB.Recordset") 
RS.Open Q "Rogerson" 

Do While Wot RS.EOF 
Print RSC'Object") 

Print RS("Edible") 

Print RS("Inheritance") 

Print RS("President") 

RS.MoveWext 
Loop 


3.3 The FP way: Comprehensions 


Within the functional programming community, 
people have argued that (list) comprehensions are 
a good query notation for database programming 
languages [2J. For example using the comprehen¬ 
sion notation supported by Haskell/DB, the query 
opresident=Faise Rogerson can be expressed as: 


do{ r <- table rogerson 
; restrict 






(r!president .==. constant False) 

> 

The query itobject ( &EdMe=True Rogerson) becomes 

do{ r <- table rogerson 

; restrict (r!edible .==. constant True) 

; project (object = r!object) 

} 

Queries that use projections and joins such as the 
following 7 t Name,Object (Presidents m Rogerson ) 
are harder to formulate because we have to indi¬ 
cate explicitly on which common fields to compare 
and how to create the resulting tuple: 

do{ r <- table rogerson 
; p <- table president 

; restrict (r!president .==. p!president) 

; project (name = p!name,object = rlobject) 

i 

The comprehensions are fully typed and automati¬ 
cally translated into correct SQL strings which are 
send to a low-level database server. This paper de¬ 
scribes not only how we did this for SQL but also 
tries to give a general recipe for embedding lan¬ 
guages into a strongly typed language. 

Let’s put the question of embedding SQL aside until 
section 6 and first look how we in general can embed 
languages into Haskell. 


4 Term embedding 

Although SQL is embedded in this specific case, 
there is a general strategy for embedding services 
in a HOT language. We will illustrate this using 
a simple SQL expression service as an example. In 
SQL, expressions are used (amongst others) in the 
search conditions of WHERE clauses to perform com¬ 
putations and comparisons on columns and values. 

4.1 SQL expression server 


Lets assume that the SQL expression server pro¬ 
vides us with the following interface for evaluating 
expressions (described in IDL): 


interface IServer 

{ void SetRequest([in,string] char* expr); 

; void GetResponse([out] char* result); 

} 

Although simple, the IServer interface captures the 
essence of many dynamic services such as a desk 
calculator, finger, HTTP, ftp, NNTP, DNS, ODBC, 
ADO and similar information servers. 

From Haskell, we can access the IServer interface 
using the functions setRequest, and getResponse 
that are automatically generated by our H/Direct 
IDL compiler [6]: 

setRequest :: String -> IServer s -> 10 0 
getResponse :: IServer s -> 10 String 

We are now able to write an evaluator function that 
takes an expression, sends it to the server and re¬ 
turns the result: 

runExpr : : String -> 10 Int 
runExpr = \expr -> 

do{ server <- createObject "Expr.Server" 

; server # setRequest expr 
; result <- server # getResponse 
; return (read result) 

} 

This is essentially the kind of interface that is in 
use now with SQL server protocols as ODBC or 
ADO. An unstructured SQL string is directly sent 
to the server. The problem is that there is noth¬ 
ing that prevents the programmer from sending in¬ 
valid strings to the server, leading to errors at run¬ 
time and/or unpredictable behavior of the server. 
Clearly, this is unacceptable in critical business ap¬ 
plications. 


4.2 Abstract syntax 

To prevent the construction of syntactically incor¬ 
rect expressions, we define an abstract syntax for the 
terms of the input language of the specific server we 
are targeting, together with a “code generator” to 
map abstract syntax trees into the concrete syntax 
of the input language. 

The abstract syntax PrimExpr simply defines literal 
constants, and unary and binary operators. (In sec¬ 
tion 6 we will add row selection): 









data PrimExpr 

= BinExpr BinOp PrimExpr PrimExpr 

I UnExpr UnOp PrimExpr 

I ConstExpr String 

data BinOp 

= OpEq | OpAnd I OpPlus I ... 

Types UnOp and BinOp are just enumerations of the 
unary and binary operators of SQL expressions. 

Writing expressions directly in abstract syntax is 
not very convenient, so we provide combinators to 
make the programmer’s life more comfortable. Each 
SQL operator is represented in Haskell by the same 
operator surrounded with dots. Some definitions 
are 2 : 


constant :: Show a => a -> PrimExpr 

(.+.) :: PrimExpr -> PrimExpr -> PrimExpr 

(.==.) :: PrimExpr -> PrimExpr -> PrimExpr 

(.AND.) :: PrimExpr -> PrimExpr -> PrimExpr 

Now we can write constant 3 .==. constant 
5 instead of the cumbersome BinExpr OpEq 
(ConstExpr (show 3)) (ConstExpr (show 5)). 
This is what embedded domain specific languages 
are all about! 


4.3 Concrete syntax 


In order to evaluate expressions, we must map them 
into the exact concrete syntax that is required by 
the server component. 

The code generator for our expression server is 
straightforward; we print expressions into their fully 
parenthesized concrete representation by a simple 
inductive function: 


pPrimExpr :: PrimExpr -> String 
pPrimExpr = \e -> 
case e of 
{ ConstExpr s 

; UnExpr op x 

-> pUnOp op++parens x 
; BinExpr op x y 

2 The constant function is unsafe since any value in the 
Show class can be used. In the real library we introduce a 
separate class ShowConstant which is only defined on basic 


-> parens x++pBinOp op++parens y 

> 

parens = \x -> "(" ++ pPrimExpr x ++ ")" 

Normally however, this step will be more involved 
as we will see in the SQL example. 


4.4 Embedding expressions 


Now that we know how to construct expressions and 
how to generate code from them, we can rewrite the 
evaluator function to use the structured expressions: 

runExpr : : PrimExpr -> 10 Int 
runExpr = \expr -> 

do{ server <- CreateObject "Expr.Server" 

; server # setRequest (pPrimExpr expr) 

; result <- server # getResponse 
; return (read result) 

i 

We can now use SQL expressions in Haskell as if 
they were built-in. Function runExpr will dynami¬ 
cally compile a PrimExpr program into target code , 
execute that on the expression server and coerce the 
result back into a Haskell integer value: 

sum :: Int -> PrimExpr 
sum = \n -> 
if (n <= 0) 
then (constant 0) 
else (constant n .+. sum (n-1)) 

test = do{ runExpr (sum 10) } 


5 Type embedding 


The above embedding is already superior to con¬ 
structing unstructured string to pass to the server 
because it is impossible to construct syntactically 
incorrect requests. However, it is still possible to 
construct ill typed request, as the following example 
shows: 


do{ let wrong = (constant 3) .AMD. (constant 5) 
; result <- runExpr wrong 
; print result 









Since the PrimExpr data type is completely un¬ 
typed, we have no way to prevent the construction 
of terms such as wrong that might crash the server 
because the operands of the AND expression are not 
of boolean type. 


5.1 Phantom types 


We used abstract syntax trees to ensure that we can 
only generate syntactically correct request, but the 
billion dollar question of course is whether there is 
a similar trick to ensure that we can only generate 
type correct requests. Fortunately, the answer is 
yes! It is possible indeed to add an extra layer on top 
of PrimExpr that effectively serves as a type system 
for the input language of the expression server. 

The trick is to introduce a new polymorphic type 
Expr a such that expr : : Expr a means that 
expr is an expression of type a. The type variable a 
in the definition of the Expr data type is only used 
to hold a type; it does not occur in the right hand 
side of its definition and is therefore never physically 
present: 


data Expr a = Expr PrimExpr 


Now we refine the types of the functions to construct 
values of type Expr a to encode the typing rules for 
expressions: 


constant :: Show a => a -> Expr a 
(.+.) :: Expr Int -> Expr Int -> Expr Int 

(.==.) :: Eq a=> Expr a-> Expr a-> Expr Bool 
(.AND.):: Expr Bool -> Expr Bool-> Expr Bool 


For example, the definition of (.==.) is now: 


(.==.) :: Eq a=> Expr a-> Expr a-> Expr Bool 
(Expr x) .==. (Expr y) 

= Expr (BinExpr OpEq x y) 

By making the Expr type an abstract data type, we 
ensure that only the primitive functions can use the 
unsafe PrimExpr type. If we now use these com- 
binators to write (constant 3) .AND. (constant 
5), the Haskell type-checker will complain that the 
type Expr Int of the operand (constant 2) does 
not match the required type Expr Bool. 


The typing of expressions via phantom type vari¬ 
ables extends immediately to values built using 
Haskell primitives. Our example function sum for 
instance, now has type sum :: Int -> Expr Int. 

Phantom type variables have many other exciting 
uses, for instance in encoding inheritance and typing 
pointers [7]. Later we will show how we use multiple 
phantom type variables to give a type safe encoding 
of attribute selection in records. 


6 Embedding SQL 


Armed with the knowledge of how to safely embed 
a simple language into Haskell, we return to our 
original task of embedding SQL into Haskell. 


6.1 The SQL server 


We will use ActiveX Data Objects (ADO) as our 
SQL server component. ADO is a COM [11] frame¬ 
work that can use any ODBC compliant database; 
MS SQL Server, Oracle, DB/2, MS Access and 
many others. The ADO object model is very rich 
but we will use only a tiny fraction of its function¬ 
ality. 

ADO represents a relation as a RecordSet object. 
It creates a set of records from a query via its Open 
method: 


dispinterface Recordset { 
void Open 

([in,optional] VARIANT Source, 

,[in,optional] VARIANT ActiveConnection, 

,[in,optional] CursorTypeEnum CursorType, 
, [in,optional] LockTypeEnum LockType, 

,[in,optional] long Options 

); 

Bool E0FO; 

void MoveNextO; 

Fields* GetFields 0; 

} 


The first argument of the Open method is the source 
of the recordset, which can be an SQL string or the 
name of a table or a stored procedure. The second 
argument of the Open method can be a connection 
string, in which case a new connection is made to 






create the recordset, or it can be a Connection ob¬ 
ject that we have created earlier. In this paper, we 
will not use the other (optional) arguments of the 
Open method, hence we provide the following signa¬ 
ture for open: 


open :: (VARIANT src, VARIANT actConn) => 
src -> actConnn -> IRecordSet r -> 10 () 


The MoveNext, EOF and GetFields methods allow 
us to navigate through the recordset. Their Haskell 
signatures are: 

moveNext :: IRecordSet r -> 10 () 

eof :: IRecordSet r -> 10 Bool 

getFields :: IRecordSet r -> 10 (IFields ()) 

The Fields interface gives access to all the fields of 

a row, they can be accessed either by position or by 
name: 


dispinterface Fields { 
long GetCountQ; 

Field* Getltem([in] VARIANT Index); 

}; 


Each Field object corresponds to a column in the 
Recordset: 


dispinterface Field { 
VARIANT GetValue(); 
BSTR GetNameO; 

} 


The GetValue property of a Field object can be 
used to obtain the value of a column in the current 
row. The GetName property returns the name of the 
field: 


getValue : : VARIANT a => IField f -> 10 a 
getName :: IField f -> 10 String 

Although the ADO object model is somewhat more 
refined than the expression evaluator example we 
have seen earlier, it does still fits the basic client- 
server framework. Requests are submitted via the 
Open method, and responses are inspected by iterat¬ 
ing over the individual Field objects of the Fields 
collection. 


6.2 Using the RecordSet in Haskell 


In Haskell, we would like to abstract from iterat¬ 
ing through the record set and access the result of 
performing a query as a list of fields. This faces us 
with the choice of either returning this list eagerly, 
or creating it lazily. In the former case, all fields are 
read into a list at once. In the latter case, the fields 
are encapsulated in a lazy stream where a field is 
read by demand. 

Both functions are defined in terms of the function 
readFields that takes an 10-action transformer 
function as an additional argument: 

readFields :: (10 a -> 10 a) 

-> IRecordSet r -> 10 [IFields ()] 
readFields = \perform -> \records -> perform $ 
do{ atEOF <- records # eof 
; if atEOF 

then do{ return [] } 
else do{ field <- records# etFields 
; records#moveNext 
; rest <- rs#readFields perform 
; return ([field] ++ rest) 

} 

| 

By taking perform to be the identity, we get a 
function that reads the list of fields strictly, by 
taking perform to be the IO-delaying function 
unsafelnterleavelO we obtain a function that 
reads the list of fields lazily. 

A simple query evaluator can now be written as: 

runQuery :: String -> 10 [IFields ()] 
runQuery = \query -> 

do{ records <- createObject "ADO.RecordSet" 

; records # open query Nothing 
; fields <- records # readFields id 
; return fields 
> 

Of course, this approach suffers from all the weak¬ 
nesses described in section 4. 


6.3 Abstract syntax 


Just as in the previous example we will define and 
abstract syntax for expressing database operations. 








Our language for expressing those operations will be 
the relational algebra. The code generator will take 
these expressions and translate them to the concrete 
syntax of SQL statements which preserve the se¬ 
mantics of the original expression. 

The abstract syntax for the relation algebra be¬ 
comes 3 : 

type TableWame = String 

type Attribute = String 

type Scheme = [Attribute] 

type Assoc = [(Attribute,PrimExpr)] 

data PrimQuery 

= BaseTable TableWame Scheme 
I Project Assoc PrimQuery 
I Restrict PrimExpr PrimQuery 
I Binary RelOp PrimQuery PrimQuery 
I Empty 

data RelOp 

= Times I Union I Intersect 
I Divide I Difference 

data PrimExpr 

= AttrExpr Attribute 
I ConstExpr String 

I BinExpr BinOp PrimExpr PrimExpr 

I UnExpr UnOp PrimExpr 

For example, the relational expression 
that returns all objects that are edible: 
7T Object (PEdibie=True Rogerson ) can be expressed 
in our abstract syntax as: 

Project [("Object".AttrExpr "Object")] 
(Restrict (BinExpr OpEq (AttrExpr "Edible") 
(ConstExpr "True")) 

(BaseTable "Rogerson" 

[ "Object", "Edible" 

, "Inheritance", "President" 

] 

)) 


6.4 Concrete syntax 


It is straightforward to generate concrete SQL state¬ 
ments from the PrimQuery data type, allthough spe¬ 
cial care has to be taken to preserve the correct se¬ 
mantics of the relational algebra due to the idiosyn¬ 
crasies of SQL. The use of the relational algebra as 

3 The Project constructor actually does both projection 
and renaming. 


an intermediate language allows us to target a wide 
range of different database languages. We are plan¬ 
ning to add bindings to other dialects of SQL and 
languages as ASN.l. 

Besides being portable, the simple semantics of the 
relational algebra allows us to perform a powerful 
set of optimizations quite easily before transform¬ 
ing the expression to concrete syntax. Many times 
the SQL server is not capable of doing these trans¬ 
formations due to the complex semantics of SQL. 
Another benefit is that we can add operations like 
table comparisons which are very hard to express 
in languages like SQL, but easy to generate from a 
relational expression. 


6.5 Towards comprehensions 


We could proceed as in our earlier example and de¬ 
fine some friendly combinators for specifying rela¬ 
tional expressions as we did in our previous example. 
However there is a serious drawback to using rela¬ 
tional expressions directly as our programming lan¬ 
guage. In the relational algebra, attributes are only 
specified by their name. There is no separate bind¬ 
ing mechanism to distinguish attributes from differ¬ 
ent tables. Suppose we take the cartesian product 
of a relation with itself. In SQL we could write: 


SELECT X.Name, X.Mark 
FROM Students As X, Students As Y 
WHERE X.Mark = Y.Mark 
AND X.Name <> Y.Name 

But in the relational algebra, we are unable to do 
this since there are common common attributes like 
Name and City which lead to ambiguity. To take 
the cartesian product, one relation needs to rename 
those attributes. The join (m) operator is espe¬ 
cially introduced to make it easier to specify the 
most common products where renaming would be 
necessary. Besides only covering the most common 
cases, it is notoriously hard to typecheck join ex¬ 
pressions [3] and we haven’t found a way to embed 
those typing rules within Haskell. 

However, why not use the same approach as SQL? 
We will introduce a binding mechanism (monad 
comprehensions) for qualifying relations. Instead of 
identifying attributes just by name, we will use both 
a name and relation. The above query is formulated 
in Haskell/DB as: 







do{ x <- table students 
; y <- table students 
; restrict (xImark .==. yI mark) 

; restrict (xlname .<>. yI name) 

; project (name = xlname, grade = x!grade) 


Under the hood, we still generate relational algebra 
expressions but all the renaming is done automat¬ 
ically within the combinators. Besides automatic 
renaming, we would like the Haskell type-checker 
to prevent us from writing silly queries such as this 
one where we ask to project the non-existing city 
attribute of a student: 

do{ x <- table students 

; project (name = xlname, city = xlcity) 


We will present two designs for implementing com¬ 
prehensions that are increasingly more type safe, 
but at the same time increasingly complex. 

6.6 Attempt 1: Untyped comprehen- 


The first attempt only hides the automatic renam¬ 
ing of attribute names, making this solution already 
much safer and convenient than writing abstract 
syntax directly. In our next attempt we will use 
phantom types to make queries more type safe. We 
defined the Query monad to express our queries. 
The use of a monad gives us the following advan¬ 
tages: 

• The do notation provides a nice syntax to write 
queries (comprehensions). 

• Monads enable a custom binding mechanism (ie 
dofx <- table X; ...>). to qualify names. 
An alternative approach of explicitly renaming 
attributes would be too cumbersome to use in 
practice. 

• An invisible state can be maintained. The 
state of the Query monad contains the (par¬ 
tially) completed relational expression and a 
fresh name supply for automatic renaming of 
attributes. 

Our query language now consists of three basic com¬ 
binators, restrict, project and table, and the 


two monad operations 4 returnQ and bindQ for the 
Query monad. Besides that we have the usual bi¬ 
nary combinators like union: 

type State = (PrimQuery.FreshNames) 
data Query a = Query (State -> (a,State)) 

returnQ : : a -> Query a 

bindQ :: Query a -> (a -> Query b) -> Query b 

restrict :: Expr Bool -> Query 0 
project :: Rec r -> Query Rel 
table :: Table -> Query Rel 

union :: Query Rel -> Query Rel -> Query Rel 

The exact details of doing correct renaming for all 
attributes are rather subtle and a thorough discus¬ 
sion is outside the scope of this paper. We will pro¬ 
vide all the details in a separate report [IS], The 
Rec r and Rel types are explained in the next sec¬ 
tion where we add typed layer on top of the com¬ 
prehension language. 


6.7 Attempt 2: Typed comprehensions 


We already know how to make the expression sub¬ 
language type safe using phantom types. For the 
comprehension language we will use the same trick. 
Central to this discussion is the attribute selection 
operator: 

(!) :: Rel -> Attribute -> PrimExpr 

Given a relation and an attribute name, the oper¬ 
ator returns the attribute value expression. Given 
that any attribute always has a well defined type, 
we parametrize the attribute by its type to return 
an expression of the same type: 

data Attr a = Attr Attribute 

(!) :: Rel -> Attr a -> Expr a 

Although we can now only use an attribute expres¬ 
sion at its right type, the system doesn’t prevent us 
from selecting non-existent attributes from the re¬ 
lation. The solution is to parametrize the Rel type 
by the its “scheme”. Similarly, we parametrize the 
Attr type again by both the scheme of the relation 
and the type of the attribute: 





data Rel r = Rel Scheme 

data Table r = Table TableName Scheme 

data Attr r a = Attr Attribute 

The Rel and Table both retain their associated 
scheme. This is needed to read the concrete values 
returned by the actual query. 

The selection operator (!) now expresses in its type 
that given a relation with scheme r that has an at¬ 
tribute of type a, it returns an expression of type a. 
The polymorphic types of the other basic combina- 
tors should not be too surprising: 

(!) :: Rel r -> Attr r a -> Expr a 

restrict :: Expr Bool -> Query () 
project :: Rec r -> Query (Rel r) 
table :: Table r -> Query (Rel r) 

Our desire to guarantee type safety bears some 
additional cost on the user. For every attribute 
attr that occurs in a query, we have to define an 
attribute definition attr : : r \attr => Attr ( attr 
:: a I r) a by hand, until TREX will provide first 
class labels. Similarly, for every base table with 
scheme r that we use, we need a definition of type 
Table r. For the example database of section 7 we 
have: 

students :: Table(name :: String, mark :: Char) 
name :: r\name => 

Attr (name :: String I r) String 
mark :: r\mark => 

Attr (mark :: Char I r) Char 

We have written a tool called DB/Direct that 
queries the system tables and automatically gener¬ 
ates a suitable database definition. This tool is of 
course written using Haskell/DB. 

The Haskell type-checker now checks the consis¬ 
tency of our queries. It accepts query passed 
without problems, but it fails to type check query 
failed because the condition student !mark .<=. 
constant 5 wrongly attempts to compare a char¬ 
acter to an integer and because the programmer ac¬ 
cidentally used the attribute ame instead of name: 

passed :: Query (Rel (name :: String)) 
passed = 

do{ student <- table students 

; restrict (student!mark .>=. constant ’B’) 

; project (name = student!name) 


} 

failed :: Query (Rel (name :: String)) 
failed = 

do{ student <- table student 

; restrict (student!mark .<=. constant 5) 
; project (name = student lame) 

} 


7 Exam marks 


Any commercial exploitation of the web today uses 
server-side scripts that connect to a database and 
deliver an HTML page composed from dynamic 
data obtained from querying the database using in¬ 
formation in the client’s request. The following ex¬ 
ample is a is a simple server-side web script that 
generates an HTML page for a database of exam 
marks and student names. 

The database is accessed via simple web page with 
a text entry and a submit button: 

My name is: 


Show my mark | 


The underlying HTML has a form element that sub¬ 
mits the query to the getMark script on the server. 


<HTML> 

<HEAD> <TITLE>Find my mark</TITLE> </HEAD> 
<B0DY> 

<F0RM ACTION="getMark.asp" METHOD= M post"> 

My name is: 

<INPUT TYPE=text NAME="name"> 

<INPUT TYPE="submit" VALUE="Show my mark"> 
</F0RM> 

</B0DY> 

</HTML> 


7.1 Visual Basic 


Even the simplest Visual Basic solution uses no less 
than four different languages. Visual Basic for the 
business logic and glue, SQL for the query, and 
HTML with ASP directives to generate the result 
page. 






1. In ASP pages, scripts are separated from the 
rest of the document by <"/, and “/,> tags. The 
prelude script declares all variables, construct 
the query and retrieves the results from the 
students database. The ASP Request ob¬ 
ject contains the information passed by the 
client to the server. The Form collection con¬ 
tains all the form-variables passed using a POST 
query. Hence Request. Form("name") returns 
the value that the user typed into the name 
textfield of the above HTML page. 

<'/. 

Q = "SELECT student.name, student.mark" 

Q = Q & " FROM Students AS student" 

Q = Q & " WHERE "student.name = " 

Q = Q & Request.Form("name") 

Set RS = CreateObject("ADO.Recordset") 

RS.Open Q "CS101" 

'/.> 

2. The body contains the actual HTML that is 
returned to the client, with a table containing 
the student’s name and mark. The <"/,= and '/„> 
tags enclose Visual Basic expressions that are 
included in the output text. Thus the snippet: 

<TR> 

<TDX , /.=RS ("name") , / 1 ></TD> 

<TDX'/,=RS ("mark") , /.X/TD> 

</TR> 

creates a table row that contains the name and 
the mark of the student who made the request: 

<HTML> 

<HEAD> <TITLE>Marks</TITLE> </HEAD> 

<B0DY> 

CTABLE B0RDER="1"> 

<TR> 

<TH>Name</TH> 

<TH>Mark</TH> 

<TR> 

<y,Do While Wot RS.E0F’/.> 

<TR> 

<TDX'/,=RS ( "name " ) , /.X/TD> 

<TDX , /.=RS ( "mark" ) , /.X/TD> 

<TR> 

<’/,RS. MoveWext’/,> 

<'/,Loop'/.> 

</TABLE> 

</B0DY> 

</HTML> 

3. The clean-up phase disconnects the databases 
and releases the recordset: 


<’/. 

RS.Close 

set RS = Nothing 
'/> 


7.2 Haskell 


The Haskell version of our example web page is more 
coherent than the Visual Basic version. Instead of 
four different languages, we need only need Haskell 
embedded in a minimal ASP page [14]: 

<7.0 LANGUAGE=HaskellScript '/.> 

<*/. 

module Main where 

import Asp 

import HtmlWizard 

main :: 10 0 

main = wrapper $ \request -> 

do{ name <- request # lookup "name" 

; r <- runQuery (queryMark name) "CS101" 

; return (markPage r) 

} 


The function queryMark is the analog of code in the 
prelude part of the Visual Basic page, except here 
it is defined as a separate function parametrized on 
the name of the student: 


type Student = Row(name :: String, mark :: Char) 

queryMark : : String -> Query Student 
queryMark = \n -> 

do{ student <- table students 

; restrict (student!name .==. lift n) 

; project 

( name = student!name 
, mark = student!mark 
) 

} 

Function markPage makes a nice HTML page from 
the result of performing the query: 


markPage :: [ Student ] 
markPage = \rs -> 
page "Marks" 

[ table 

( headers = [ "Name", "Mark"] 
, rows = [[rlname, rlmark:""] 




The Haskell program is more concise and more 
modular than the Visual Basic version. Functions 
queryMark and markPage can be tested separately, 
and perhaps even more importantly, we can easily 
reuse the complete program to run in a traditional 
CGI-based environment, by importing the CGI mod¬ 
ule instead of Asp (in a language such as Standard 
ML we would have parametrized over the server in¬ 
terface) . 


8 Status and conclusions 


The main lesson of this paper is a new design princi¬ 
ple for embedding domain specific languages where 
embedded programs are compiled on-the-fly and ex¬ 
ecuted by submitting the target code to a server 
component for execution. We have shown how to 
embed SQL into Haskell using this principle, but 
there are numerous other possible application do¬ 
mains where embedded compilers are the implemen¬ 
tation technology of choice; many Unix services are 
accessible using a completely text-based protocol 
over sockets. 

Traditionally, domain abstractions are available 
as external libraries. For instance, the JMAIL 
component (available at the time of writing at 
http://www.dimac.net/) provides a plethora of 
methods to compose email messages, to show just a 
few: 


dispinterface ISMTPMail { 

VARIANT_B00L Execute(); 

void AddRecipient([in] BSTR Email); 

[propget] BSTR Sender(); 

[propput] void Sender([in] BSTR rhs); 
[propget] BSTR Subject 0; 

[propput] void Subject([in] BSTR rhs); 
[propget] BSTR BodyO; 

[propput] void Body([in] BSTR rhs); 

}; 

Instead of providing a whole bunch of methods to 
construct an email message in an imperative style, 
an alternative approach would be to have a raw 
(SMTP) mail server [16] component that accepts 


email messages in the RFC822 format [4] directly 
together with a set of combinators to build email 
messages in a compositional style. Ultimately, these 
abstract email messages are “compiled” into raw 
strings that are submitted to the mail server, per¬ 
haps by piping into the appropriate telnet port. 

Our ultimate goal for a Domain Specific Embedded 
Compiler is to provide hard compile-time guaran¬ 
tees for type safety and syntactical correctness of 
the generated target program. Syntactical correct¬ 
ness of target programs can be garantueed by hiding 
the construction of programs behind abstract data 
types. Phantom types, polymorphic types whose 
type parameter is only used at compile-time but 
whose values never carry any value of the param¬ 
eter type, are a very elegant mechanism to impose 
the Haskell type system on the embedded language. 

Our final example shows how Domain Specific Em¬ 
bedded Compilers can make server-side web script¬ 
ing more productive. Because we can leverage on 
the abstraction mechanisms of Haskell (higher-order 
functions, module system), compared to the VB so¬ 
lution, the Haskell program is of higher quality, and 
easier to change and maintain. The formulation 
of queries using the doO notation and extensible 
records is rather neat, but the exact translation into 
SQL turned out to be rather subtle. 

Both the Haskell/DB and the DB/Direct pack¬ 
ages are available on the web at the URL 
http://www.haskell.org/haskellDB. 
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